
European Heart Journal - Digital Health

Oxford University Press (OUP)

Preprints posted in the last 90 days, ranked by how well they match European Heart Journal - Digital Health's content profile, based on 15 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Artificial Intelligence for Cardiac Biomarkers After Myocardial Infarction: A Systematic Review and a Leakage-Aware Modeling Framework

Piorkowska, N. J.; Olejnik, A.; Madeyski, L.; Musz, A.; Kuliczkowski, W.; Mysiak, A.; Zyłka, A.; Bil-Lula, I.

2026-04-29 cardiovascular medicine 10.64898/2026.04.28.26351958 medRxiv
Top 0.1%
28.0%

Aims: To systematically evaluate how artificial intelligence and machine-learning (AI/ML) methods are applied to cardiac biomarkers after myocardial infarction (MI), identify recurring methodological limitations, and operationalize a leakage-aware modelling workflow in a proof-of-concept post-MI dataset using a controlled proxy classification task. Methods and results: A PRISMA 2020-compliant systematic review of studies published between 2015 and 2025 identified 120 eligible studies from 1,389 records. Most studies used multimodal inputs combining biomarkers with clinical or functional variables (109/120, 90.8%) and focused on prediction or prognostic modelling (89/120, 74.2%). Logistic or regularized regression (76/120, 63.3%) and Random Forest (69/120, 57.5%) were the most frequently used approaches. Internal validation predominated, whereas independent external validation was reported in only 44/120 studies (36.7%). Area under the receiver operating characteristic curve (ROC-AUC) was reported in 114/120 studies (95.0%), while calibration analyses and decision-curve analysis remained limited. Formal explainability methods were used inconsistently, and public code availability was uncommon. To translate these observations into a practical framework, we implemented a leakage-aware machine-learning workflow in a proof-of-concept dataset of 152 patients with MI and 117 variables. The analytical task was defined as a binary classification problem (STEMI vs NSTEMI), used intentionally as a methodological proxy rather than a clinically relevant prognostic endpoint. Three predefined feature-set variants were benchmarked using nested cross-validation. The FULL variant achieved near-perfect discrimination [ROC-AUC 0.9988 (95% CI 0.9925-1.000)], the CLINICAL variant showed modest performance [0.6025 (0.4463-0.7450)], and the BIOMARKERS variant yielded strong discrimination with low dimensionality [0.9300 (0.8537-0.9863)]. 
Permutation-based falsification testing reduced performance towards chance level, supporting the procedural integrity of the workflow. Conclusions: AI/ML research on cardiac biomarkers after MI is expanding rapidly but remains limited by heterogeneous methodology, insufficient external validation, incomplete interpretability, and weak reproducibility practices. A leakage-aware framework integrating explicit feature governance, nested validation, calibration assessment, robustness analyses, and falsification testing may improve the credibility and translational relevance of biomarker-based cardiovascular AI studies. However, the proof-of-concept case study is intended as a methodological demonstration and does not represent prognostic modelling of post-MI outcomes. Translational Perspective: AI models using cardiac biomarkers after MI often report strong discrimination, but their clinical value is undermined by limited external validation, incomplete calibration assessment, and poor transparency. Our systematic review identifies these recurrent weaknesses, while the proof-of-concept case study demonstrates how a leakage-aware workflow can distinguish clinically plausible signals from structurally inflated performance under controlled analytical conditions. The use of a proxy classification task highlights methodological behavior rather than clinical prognosis, underscoring the need for future studies to validate such frameworks on clinically meaningful post-MI outcomes. Integrating explicit feature governance, nested validation, calibration, decision-analytic assessment, and falsification testing may help move biomarker-based cardiovascular AI from promising retrospective performance towards more reproducible and clinically trustworthy prediction models.
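
The idea behind a permutation-based falsification test can be illustrated with a minimal Python sketch. It is not the authors' workflow: the function names (`roc_auc`, `permuted_auc_mean`), the toy labels and scores, and the simplification of re-scoring fixed predictions against shuffled labels (a full workflow would presumably refit the model inside each permutation) are all assumptions made for illustration.

```python
import random


def roc_auc(labels, scores):
    """ROC-AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permuted_auc_mean(labels, scores, n_permutations=200, seed=0):
    """Mean AUC after shuffling the labels; it should sit near 0.5 if nothing leaks."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any real link between labels and scores
        total += roc_auc(shuffled, scores)
    return total / n_permutations


# Toy data (hypothetical): scores separate the two classes perfectly.
labels = [0] * 10 + [1] * 10
scores = [i / 20 for i in range(20)]

observed = roc_auc(labels, scores)
null_mean = permuted_auc_mean(labels, scores)
print(f"observed AUC: {observed:.3f}, mean AUC under permuted labels: {null_mean:.3f}")
```

If the permuted-label AUC stayed well above 0.5, that would point to information reaching the score through some route other than the true labels, which is the kind of leakage the abstract's workflow is meant to expose.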

2
An ECG foundation model for generalizable cardiac function prediction across the lifespan

Yang, Y.; Peracchio, L.; Mayourian, J.; Miller, T.; La Cava, W.

2026-05-27 health informatics 10.64898/2026.05.26.26354128 medRxiv
Top 0.1%
26.8%

Background Artificial intelligence-enhanced electrocardiography (AI-ECG) enables scalable, low-cost cardiac dysfunction screening, but existing models are annotation-intensive and predominantly adult-derived, leaving paediatric generalizability uncertain. Paediatric cohorts exhibit highly variable cardiac morphology and function compared to adults, which may be useful for learning generalizable AI-ECG models. Methods We pretrained ECG-Fyler on a predominantly paediatric, all-age cohort at Boston Children's Hospital (1992-2023), annotated with a cardiology-specific coding system (Fyler codes), and evaluated it on assessments from echocardiography (echo) and cardiac magnetic resonance (CMR) studies. We validated on an external adult cohort from Columbia University Irving Medical Center. Performance was benchmarked against several AI-ECG foundation models by AUROC across age groups, lesion types, and limited-data scenarios. Findings The pretraining cohort comprised 782,138 ECGs from 255,271 patients (median age: 10.9 years, IQR: [2.8-16.8]). Internal evaluation included 178,495 ECG-echo pairs (median age: 10.9 [3.7-17.0]) and 8,584 ECG-CMR pairs (median age: 20.7 [15.6-29.6]). External validation included 82,543 ECG-echo pairs from adults (median age: 64.0 [52.0-74.0]). ECG-Fyler improved AUROC across biventricular dysfunction and dilation tasks, with the largest gains in low-data settings. In internal validation, ECG-Fyler detected low left ventricular ejection fraction (LVEF ≤ 40%) from only 100 fine-tuning samples (AUROC: 0.80, 95% CI: [0.78-0.80]), outperforming other models (AUROC < 0.65) and improving with additional fine-tuning (AUROC: 0.94 [0.93-0.94]). Similar improvements were observed for CMR-derived LVEF, RVEF, and ventricular dilation. In external validation on adults, ECG-Fyler exhibited an AUROC of 0.83 (CI: [0.82-0.85]) for LVEF ≤ 40%. 
After fine-tuning on less than 10% of external data, LVEF ≤ 45% performance (AUROC: 0.87 [0.86-0.88]) outperformed a fully trained, site-specific prior model (AUROC: 0.85 [0.84-0.87]). Interpretation Pretraining on richly annotated, paediatric-dominant ECGs yields models that transfer efficiently across institutions and ages, supporting AI-ECG screening and triage when labels or imaging access are limited. Funding National Institutes of Health (R01LM012973); Kostin Innovation Fund, Boston Children's Hospital

3
Beyond Doppler: Scalable AI Detection of LVOT Obstruction in HCM

Crystal, O.; Farina, J. M. M.; Scalia, I. G.; Ayoub, C.; Park, H.-B.; Kim, K. A.; Arsanjani, R.; Lester, S. J.; Banerjee, I.

2026-04-20 cardiovascular medicine 10.64898/2026.04.17.26351151 medRxiv
Top 0.1%
25.9%

Background: Accurate assessment of left ventricular outflow tract (LVOT) gradients is critical for hypertrophic cardiomyopathy (HCM) management, yet Doppler-based measurements are technically demanding and require expertise. Objective: To develop a multi-view deep learning model capable of classifying LVOT obstruction (> 20 mmHg) using routine 2D echocardiographic windows without reliance on Doppler imaging. Methods: We trained and externally validated a cross-attention-based video-to-video fusion framework that integrated EchoPrime-derived video representations from three standard transthoracic echocardiographic views to classify LVOT gradients. Results: Training was performed on a derivation cohort (N = 1833) from a tertiary care system in the United States, with model performance evaluated on an internal held-out test set (N = 275) and a Korean external validation cohort (N = 46). Single-view baselines showed limited discrimination (external AUROCs 0.47-0.70). Conversely, the domain-specific foundation model (EchoPrime) achieved superior single-view performance (AUROCs 0.75-0.80 internal; 0.79-0.83 external), highlighting the importance of echo-specific pretraining and temporal modeling. The proposed multi-view fusion further enhanced predictive performance, with the late fusion model reaching an AUROC of 0.84 on the external cohort despite significant population shift. Conclusions: These results suggest LVOT physiology is encoded in routine 2D imaging and can be leveraged for clinically relevant gradient classification without Doppler input. The proposed AI-guided strategy demonstrates substantial cost savings compared with the screen-all approach. By integrating complementary spatial-temporal information across multiple views, our approach generalizes robustly across populations and may enable real-time decision support, extend LVOT assessment to portable or resource-limited settings, and complement Doppler-based evaluation for longitudinal HCM management.

4
Screening for patients at risk for cardiac amyloidosis via electronic health records: A multicenter machine learning development and validation study

Spielvogel, C. P.; Kersting, D.; Haberl, D.; Autherith, M.; Hauptmann, L.; Yu, J.; Hennenberg, J.; Kluge, K.; Moon, K.; Settelmeier, S.; Ning, J.; Kumpf, K.; Koefler, M.; Hofer, F.; Mascherbauer, K.; Kammerlander, A.; Traub-Weidinger, T.; Kasprian, G.; Rassaf, T.; Kleesiek, J.; Herrmann, K.; Bartko, P.; Hengstenberg, C.; Hacker, M.; Calabretta, R.; Nitsche, C.

2026-04-28 cardiovascular medicine 10.64898/2026.04.27.26351820 medRxiv
Top 0.1%
17.6%

Background: Timely detection is crucial to improve outcomes in patients with cardiac amyloidosis (CA) by initiation of life-saving treatments. Although confirmatory bone scintigraphy is highly accurate for CA detection, identifying at-risk patients for referral remains challenging. Objectives: This study aimed to develop and validate a machine learning model, Amylo-Detect, using structured multimodal electronic health record (EHR) data to guide referrals for confirmatory scintigraphy and monoclonal protein testing. Methods: Consecutive all-comer patients (n=11,616) referred for bone scintigraphy at the Vienna General Hospital (2010-2023) were retrospectively included. Patients referred before August 2020 formed the development cohort. The remaining patients comprised the internal validation cohort. External validation was performed at the University Hospital Essen (n=1,521). Amylo-Detect was trained using 50 routinely available parameters to predict CA-suggestive uptake (Perugini grade ≥ 2) and compared with an existing score and clinical routine. Results: High-grade uptake was present in 388 patients (3.0%). Amylo-Detect demonstrated excellent performance in development (AUC 0.93), independent internal validation (AUC 0.91), and external validation cohort (AUC 0.91), outperforming existing scoring systems and clinical routine. Results were consistent across subgroups, even when crucial predictors were missing. Of the 42/388 (10.8%) patients missed in clinical routine, 12/42 (29%) were additionally detected by Amylo-Detect. The model further conveyed significant prognostic value for mortality and heart failure hospitalization. Conclusions: We present Amylo-Detect, a validated EHR-based tool for CA risk prediction, available as a web app, allowing application and further evaluation. By improving timely detection and referral, Amylo-Detect promises to address diagnostic delays and improve outcomes. 
Author summary: Cardiac amyloidosis is a progressive and often fatal heart disease that is frequently diagnosed too late, even though effective treatments are now available. A major challenge is recognizing which patients should be referred for specialized testing, because early symptoms are diverse and often non-specific. In this study, we developed and validated a machine learning prediction system, called Amylo-Detect, that uses routinely collected information from electronic health records to identify patients who may be at risk for cardiac amyloidosis. We trained and externally validated the tool using data from 13,137 patients across two hospitals. We found that Amylo-Detect was highly accurate in identifying patients with disease-indicative findings on confirmatory scans and identified a substantial number of patients who were not initially suspected of having cardiac amyloidosis. Amylo-Detect, which we make publicly available via a web app, consistently outperformed existing risk scores and routine clinical decision-making. Our findings suggest that automated analysis of electronic health records can support clinicians in recognizing cardiac amyloidosis earlier, reduce diagnostic delays, and potentially improve patient outcomes by enabling timely treatment.

5
Enhanced Demographically Adaptive QT Correction Improves Pediatric Screening for Congenital Long QT Syndrome

Haq, K.; Berul, C.; Posnack, N.

2026-05-19 cardiovascular medicine 10.64898/2026.05.14.26353243 medRxiv
Top 0.1%
17.0%

Background: Traditional heart rate (HR) adjusted QT correction (QTc) formulae often fail to eliminate the inverse HR-QT interval relationship, particularly in pediatric patients. In this study, we optimized our previously published adaptive QTc (QTcAd) formula by including additional demographic variables and broadening the pediatric age range. We tested the hypothesis that QTcAd improves congenital long QT syndrome (congenital LQTS) detection performance and reduces erroneous classifications across pediatric cohorts. Methods: We retrospectively analyzed 8,306 ECGs from 4,556 cardiovascular disease (CVD)-free pediatric patients. For neonatal patients (1-30 days old), we derived daily QTcAd parameter values. For older patients, we developed regression models to estimate QTcAd parameters (mean Heart Rate (HR) = -15.9ln(days) + 219; |m| = 0.0001(days) + 1, where |m|=absolute HR-QT regression slope). To support LQTS screening, we constructed dynamic QTcAd thresholds by estimating age-specific reference limits. Diagnostic performance was tested in a clinically confirmed LQTS cohort (n=137), and further evaluated in the Pediatric Heart Network (PHN; n=2,394) and Emergency Department (ED; n=2,002) cohorts. Results: Using the confirmed LQTS cohort as the event population and the CVD-free cohort as the non-event population, QTcAd demonstrated higher sensitivity than QTcB (92% vs 46.7%). QTcAd maintained high specificity (96.9% vs 98.9%), which resulted in a higher Youden index (0.889 vs 0.456). In the PHN healthy cohort, both QTc formulae classified the majority of individuals as normal (QTcAd 95%; QTcB 98.2%) indicating few false-positives. In the ED cohort, QTcAd reduced borderline/prolonged QTc classifications requiring follow-up, yielding 270 fewer repeat-testing triggers than QTcB. We developed a publicly accessible calculator to compute QTcAd and classify congenital LQTS risk. Conclusion: We developed and validated an enhanced QTcAd formula for pediatric patients. 
QTcAd-based, age-adjusted dynamic thresholding improved performance for congenital LQTS screening while maintaining high specificity. This reduces false-positive LQTS classifications and repeat ECGs, thereby decreasing unnecessary downstream clinical evaluation.
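
The two age regressions and the Youden index quoted in this abstract can be worked through in a short Python sketch. It only evaluates the formulas as printed (mean HR = -15.9 ln(days) + 219 and |m| = 0.0001(days) + 1, stated for patients older than the neonatal range); it does not implement the QTcAd correction itself, which the abstract does not give. The function names and the example age of 365 days are assumptions for illustration.

```python
import math


def expected_mean_hr(age_days):
    """Age-based mean heart rate from the abstract: -15.9 * ln(days) + 219."""
    if age_days <= 0:
        raise ValueError("age_days must be positive")
    return -15.9 * math.log(age_days) + 219.0


def expected_abs_slope(age_days):
    """Absolute HR-QT regression slope |m| from the abstract: 0.0001 * days + 1."""
    return 0.0001 * age_days + 1.0


def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1 (inputs as fractions)."""
    return sensitivity + specificity - 1.0


# Hypothetical example: parameter estimates for a one-year-old (365 days).
print(f"mean HR parameter at 365 days: {expected_mean_hr(365):.2f}")
print(f"|m| parameter at 365 days: {expected_abs_slope(365):.4f}")

# Sensitivity and specificity as reported in the abstract.
print(f"Youden index, QTcAd: {youden_index(0.92, 0.969):.3f}")
print(f"Youden index, QTcB: {youden_index(0.467, 0.989):.3f}")
```

The two Youden values reproduce the 0.889 and 0.456 reported above, which confirms that the index in the abstract is the plain sum of sensitivity and specificity minus one.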

6
EchoAtlas: A Conversational, Multi-View Vision-Language Foundation Model for Echocardiography Interpretation and Clinical Reasoning

Chao, C.-J.; Asadi, M.; Li, L.; Ramasamy, G.; Pecco, N.; Wang, Y.-C.; Poterucha, T.; Arsanjani, R.; Kane, G. C.; Oh, J. K.; Banerjee, I.; Langlotz, C. P.; Fei-Fei, L.; Adeli, E.; Erickson, B. J.

2026-03-17 cardiovascular medicine 10.64898/2026.03.14.26348388 medRxiv
Top 0.1%
15.3%

Echocardiography is the most widely used cardiac imaging modality, yet artificial intelligence-enabled interpretation remains limited by the inability of existing models to integrate visual assessment, quantitative measurement, and clinical reasoning within a unified framework. Here we present EchoAtlas, the first autoregressive vision-language model developed for echocardiographic interpretation. Trained on over 12.9 million question-answer pairs derived from approximately 2 million echocardiogram videos, EchoAtlas achieves 0.966 accuracy on multiple-choice questions in our internal test set and establishes a new state-of-the-art on the public MIMIC-EchoQA benchmark (0.699 vs. 0.508 previously). EchoAtlas also provides accurate quantitative measurements, segment-level regional wall motion assessment, longitudinal comparison, and diagnostic reasoning across diverse question formats -- capabilities not previously demonstrated in this domain. These results highlight the potential of autoregressive vision-language models as a foundation for interactive echocardiographic interpretation, representing an early step toward scalable, auditable artificial intelligence systems in cardiology practice.

7
PRAM: Post-hoc Retrieval Augmentation for Parameter-Free Domain Adaptation of ICU Clinical Prediction Models

Jeong, I.; Lee, T.; Kim, B.; Park, J.-H.; Kim, Y.; Lee, H.

2026-04-05 health systems and quality improvement 10.64898/2026.04.03.26350132 medRxiv
Top 0.1%
14.9%

Background Clinical prediction models degrade when deployed across hospitals, yet retraining requires technical expertise, labeled data, and regulatory re-approval. We investigated whether post-hoc retrieval augmentation of a frozen model's output, analogous to retrieval-augmented methods in natural language processing, can mitigate this degradation without any parameter modification. Methods We developed the Post-hoc Retrieval Augmentation Module (PRAM), which combines predictions from a frozen base model with outcome information retrieved from similar patients in a local patient bank. Five base models (logistic regression through CatBoost) and three retrieval strategies were evaluated on 116,010 ICU patients across three databases (MIMIC-IV, MIMIC-III, eICU-CRD) for acute kidney injury (AKI) and mortality prediction. A bank size deployment simulation modeled performance from zero to full local data accumulation, complemented by source bank cold start, stress tests, and calibration experiments. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC). Results Retrieval benefit was inversely associated with base model complexity (ρ = -0.90 for AKI, -1.00 for mortality): simpler models benefited more, consistent with retrieval capturing residual signal unexploited by the base model. PRAM showed a statistically significant monotone dose-response between bank size and prediction performance across all six outcome-target combinations (Kendall τ trend test, q = 0.031 for all). At the pre-specified primary comparison (bank = 5,000), the improvement was confirmed for the two largest-shift settings (eICU-CRD AKI: ΔAUROC = +0.012, q < 0.001; eICU-CRD mortality: ΔAUROC = +0.026, q < 0.001). Pre-loading a source bank bridged the cold-start gap, providing an immediate performance gain equivalent to approximately 2,000 to 5,000 local patients. 
Conclusions PRAM provides a parameter-free adaptation mechanism that requires no model retraining, gradient computation, or regulatory re-evaluation at the deployment site. Effect sizes were modest and did not reach cross-model superiority, but the consistent dose-response pattern and the absence of retraining requirements establish retrieval-based adaptation as a viable approach for clinical model transportability. The retrieval mechanism additionally opens a pathway toward case-based interpretability, where predictions are accompanied by identifiable similar patients from the deploying institution.
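
The general idea of post-hoc retrieval augmentation (a frozen model's prediction combined with outcomes of similar patients from a local bank) can be sketched in a few lines of Python. This is not the PRAM implementation: the abstract does not specify the retrieval strategy or the combination rule, so the Euclidean k-nearest-neighbour lookup, the convex blend with weight `alpha`, and every name below (`knn_outcome_rate`, `blended_risk`, the toy bank) are assumptions chosen only to make the mechanism concrete.

```python
import math


def knn_outcome_rate(query, bank, k=3):
    """Mean observed outcome among the k bank patients closest to `query`.

    `bank` is a list of (feature_vector, outcome) pairs with outcome 0 or 1.
    Euclidean distance is an assumption; the abstract names no metric.
    """
    ranked = sorted(bank, key=lambda item: math.dist(query, item[0]))
    neighbours = ranked[:k]
    return sum(outcome for _, outcome in neighbours) / len(neighbours)


def blended_risk(base_prob, query, bank, k=3, alpha=0.5):
    """Blend a frozen model's probability with the retrieved outcome rate.

    The base model is never modified: only its output is adjusted.
    With an empty bank (cold start) the base prediction is returned unchanged.
    """
    if not bank:
        return base_prob
    retrieved = knn_outcome_rate(query, bank, k)
    return (1.0 - alpha) * base_prob + alpha * retrieved


# Hypothetical local bank: two features per patient, binary outcome.
bank = [((0.0, 0.0), 0), ((0.0, 1.0), 0), ((1.0, 0.0), 1), ((5.0, 5.0), 1)]

risk = blended_risk(base_prob=0.5, query=(0.0, 0.0), bank=bank)
print(f"blended risk: {risk:.3f}")
```

In this toy form, a growing bank changes predictions without any gradient step, which mirrors the abstract's point that adaptation happens through retrieval and not through retraining.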

8
Prognostic value of artificial intelligence-derived echocardiographic measurements in transthyretin cardiomyopathy

Walser, A.; Flammer, A. J.; Hundertmark, M. J.; Shiri, I.; Ciocca, N.; Ryffel, C.; de Marchi, S.; Schwotzer, R.; Ruschitzka, F.; Tanner, F. C.; Graeni, C.; Benz, D. C.

2026-04-02 cardiovascular medicine 10.64898/2026.04.01.26349281 medRxiv
Top 0.1%
14.4%

Background: Transthyretin cardiomyopathy (ATTR-CM) is a progressive, potentially fatal disease requiring accurate risk stratification. Echocardiography is the first-line imaging modality, with AI-based tools increasingly applied for automated analysis, yet their prognostic value remains unknown. Objectives: To examine the prognostic value of AI-derived echocardiographic measurements and their incremental value beyond biomarker staging in ATTR-CM. Methods: This retrospective study included patients from two ATTR-CM registries. Baseline echocardiograms were analyzed using the fully automated AI-based software Us2.ai. Prognostic performance was assessed by Kaplan-Meier analysis, Cox regression, and ROC curves. A two-parameter echocardiographic staging system combining left ventricular (LV) global longitudinal strain (GLS) and right ventricular (RV) fractional area change (FAC) stratified patients into low (both normal), intermediate (one abnormal), and high risk (both abnormal). Results: Among 347 patients (91% male, median age 78 years), 141 experienced all-cause death or heart failure hospitalization over a median follow-up of 2.4 years. In multivariable analysis, AI-derived LV-GLS (HR 1.13 [1.03-1.25], p=0.011) and RV FAC (HR 0.96 [0.93-0.99], p=0.014) were independent outcome predictors. Echo staging stratified risk into groups with 3-fold (95% CI 1.70-5.91) and 6-fold (95% CI 3.22-10.30) increased hazard compared to low risk (p<0.001), with incremental prognostic value beyond National Amyloidosis Centre (NAC) staging and age (chi-square from 53 to 80; p<0.001). AI and human measurements showed comparable 1-year predictive performance (all p>0.05). Conclusion: AI-derived echocardiographic measurements demonstrate independent and incremental prognostic value beyond biomarker-based NAC staging in ATTR-CM, comparable to human measurements, supporting their integration into clinical risk stratification.

9
Does ECG-Based AI Detect Aortic Stenosis Beyond Conventional LVH Criteria? An Analysis of the CLIDAS Database

Shimada, T.; Kodera, S.; Sawano, S.; Guan, J.; Saitoh, W.; Wakasa, S.; Ito, S.; Yanagishita, T.; Hayashi, Y.; Shibata, A.; Ito, A.; Otsuka, K.; Higashikuni, Y.; Okamura, H.; Tsujita, K.; Node, K.; Yamaguchi, O.; Makimoto, H.; Kabutoya, T.; Imai, Y.; Nakayama, M.; Sato, H.; Fujita, H.; Kohro, T.; Matoba, T.; Takeda, N.; Fukuda, D.; Nagai, R.

2026-06-08 cardiovascular medicine 10.64898/2026.06.07.26355087 medRxiv
Top 0.1%
14.3%

Background: Aortic stenosis (AS) is a progressive valvular disease associated with poor prognosis once symptoms develop, yet routine echocardiographic screening is impractical. While artificial intelligence (AI)-based electrocardiogram (ECG) models have shown promise for AS detection, it remains unclear whether they primarily reflect conventional left ventricular hypertrophy (LVH) voltage criteria or capture additional ECG features. Methods and Results: We developed a deep learning model using 244,816 ECGs from 51,713 patients across six academic institutions in Japan (CLIDAS database). AS labels were derived from inpatient Diagnosis Procedure Combination (DPC) codes. The model achieved an area under the receiver operating characteristic curve (AUC) of 0.849 (95% confidence interval 0.832-0.865) in the independent test cohort, with consistent performance across institutions, sex, and age. At a threshold of 0.1, sensitivity was 79.1%, specificity was 73.9%, and negative predictive value (NPV) was 98.0%. Conventional LVH voltage criteria (Sokolow-Lyon AUC 0.706; Cornell AUC 0.692) showed lower performance, and adding them to the AI model conferred no incremental benefit (AUC 0.849 vs. 0.847). Gradient-weighted class activation mapping (Grad-CAM) revealed predominant attention around QRS complexes in limb leads, beyond regions typically assessed in LVH evaluation. Conclusions: This multicenter AI-ECG model demonstrated strong discrimination for AS and captured ECG features beyond conventional LVH voltage criteria. The high NPV supports its use as a rule-out pre-screening tool.

10
Beyond Agreement: a real-world study of the workflow gap between echocardiography and timely structural cardiac assessment. How a Validation Study Exposed a Hidden Gap in Cardiac Care

Nogueira, M. A.; Ferreira, F. C.; Batista, E.; Eira, S.; Proenca, G.; Matias, C.; Kecskes, I.

2026-05-15 cardiovascular medicine 10.64898/2026.05.12.26352129 medRxiv
Top 0.1%
14.2%

Objectives To assess agreement between Cardio-HART (CHART) and echocardiography for left ventricular ejection fraction (LVEF) estimation and heart failure (HF) classification in a real-world predominantly ischaemic cohort, while examining whether a point-of-care structural and functional assessment tool could reveal a broader workflow gap between the nominal availability of echocardiography and timely cardiac assessment in routine care. Design Prospective single-centre cohort study. Setting Secondary-care cardiology service at Cascais Hospital, Lisbon, Portugal. Participants Forty-seven adults referred for cardiology evaluation with suspected HF or followed in a hospital HF clinic. Primary and secondary outcome measures Agreement between CHART-derived and echocardiographic LVEF by Bland-Altman analysis; diagnostic performance for HF phenotypes; comparison with the Teichholz method. Results Mean age was 65.6 ± 15.9 years; 78.7% of participants had HF and 43.2% of HF cases were ischaemic. CHART showed a mean LVEF bias of +1.92% versus echocardiography, with 95% limits of agreement from -14.6% to +18.4% and a mean absolute error of 6.09%. Agreement was strongest in HF with reduced ejection fraction (HFrEF) and HF with mildly reduced ejection fraction (HFmrEF), and lower in HF with preserved ejection fraction (HFpEF). Diagnostic area under the curve for HFrEF classification was 0.89. Compared with the Teichholz method, CHART showed a lower root mean square error relative to Simpson's biplane LVEF. Conclusions CHART showed clinically credible performance for LVEF estimation and HF stratification, particularly in reduced-EF phenotypes. However, the most important finding of this study was not agreement alone. By performing credibly in a cardiology-based real-world setting, CHART exposed a previously under-recognised workflow gap between the nominal availability of echocardiography and timely access to structural cardiac assessment in routine care. 
The study therefore suggests that the value of CHART lies not only in diagnostic performance, but in making visible, and potentially narrowing, a hidden but consequential gap in cardiac assessment pathways. Larger studies are warranted, particularly for HFpEF and across broader clinical workflows.
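
The Bland-Altman quantities reported in this abstract (bias, 95% limits of agreement, mean absolute error) follow a standard recipe, sketched below in Python. The paired values are invented for illustration and are not study data; the function name `bland_altman` and the use of the sample standard deviation with a 1.96 multiplier are assumptions about the convention, which the abstract does not spell out.

```python
import statistics


def bland_altman(method_a, method_b):
    """Bias, 95% limits of agreement, and mean absolute error for paired readings."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)   # mean difference between the methods
    sd = statistics.stdev(diffs)    # sample SD of the differences
    return {
        "bias": bias,
        "lower": bias - 1.96 * sd,  # lower 95% limit of agreement
        "upper": bias + 1.96 * sd,  # upper 95% limit of agreement
        "mae": statistics.mean([abs(d) for d in diffs]),
    }


# Hypothetical paired LVEF readings (%), device estimate vs echocardiography.
device = [50.0, 40.0, 60.0, 30.0]
echo = [48.0, 41.0, 55.0, 30.0]

result = bland_altman(device, echo)
print({name: round(value, 2) for name, value in result.items()})
```

Under this convention the reported limits of -14.6% to +18.4% sit symmetrically around the +1.92% bias (about ± 16.5 points), which would correspond to a standard deviation of the differences of roughly 8.4 percentage points.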

11
AutoClip: AI-Guided TEE Semantic Segmentation for TEER: A Proof-of-Concept Study

Chen, M.; Li, X.; Yang, K.; Taramasso, M.

2026-06-06 cardiovascular medicine 10.64898/2026.05.29.26354195 medRxiv
Top 0.1%
13.9%

Background: Transcatheter edge-to-edge repair (TEER) is an established treatment for mitral regurgitation but remains highly dependent on operator experience and complex transesophageal echocardiography (TEE)-guided intraprocedural imaging. Artificial intelligence (AI)-based semantic segmentation may improve procedural reproducibility and intraprocedural guidance; however, no TEER-specific segmentation framework has been reported. Objectives: To develop and evaluate AutoClip, a clinician-driven AI-guided TEE semantic segmentation model designed for simultaneous delineation of mitral valve anatomy and in-vivo TEER device components. Methods: A retrospective proof-of-concept study was conducted using 987 intraprocedural TEE frames derived from 10 video clips in 3 patients undergoing MitraClip G4 implantation. Seven semantic labels, including mitral leaflets and device components, were manually annotated using ITK-SNAP. Following standardized preprocessing and region-of-interest extraction, an Attention U-Net architecture was trained frame-wise on bicommissural and corresponding X-plane TEE views. Model performance was assessed using mean intersection-over-union (IoU) and Dice coefficient on an independent test set. Results: The Attention U-Net demonstrated improved sensitivity to small device structures compared with conventional U-Net architectures. Preliminary training performance achieved a mean IoU of approximately 0.93, while independent test performance reached a mean IoU of 0.46 across foreground classes. Qualitative assessment demonstrated feasible simultaneous segmentation of mitral leaflets, clip arms, grippers, and delivery shaft during TEER procedures. Conclusions: AutoClip represents a proof-of-concept TEER-specific TEE semantic segmentation framework initiated through a clinician-oriented workflow without formal computer science expertise. 
Although preliminary accuracy remains modest due to limited sample size, this study establishes a reproducible pathway for future AI-assisted intraprocedural guidance systems and larger multicenter development efforts in structural heart interventions.
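
The two overlap metrics named in this abstract, intersection-over-union and the Dice coefficient, can be computed for a single class as in the Python sketch below. The flat 0/1 masks are toy values, not study data; the function name `iou_and_dice` and the choice to score two empty masks as 1.0 are assumptions. A mean IoU across foreground classes, as reported above, would average the per-class values.

```python
def iou_and_dice(pred, truth):
    """IoU and Dice for one class, given flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return iou, dice


# Toy 4-pixel masks (hypothetical): one pixel overlaps, each mask has two pixels set.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 1, 0]

iou, dice = iou_and_dice(pred_mask, true_mask)
print(f"IoU: {iou:.3f}, Dice: {dice:.3f}")
```

For a single mask pair the two metrics are linked by Dice = 2 IoU / (1 + IoU), so Dice is always at least as large as IoU; this identity does not carry over exactly to values averaged across classes.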

12
Deep learning optimisation for cardiology: Neural Architecture Search-driven arrhythmia classification with electrocardiograms

Vanegas Mueller, E.; Joe-Oshodi, A.; Banerjee, A.; Villarroel, M.

2026-05-30 cardiovascular medicine 10.64898/2026.05.28.26354348 medRxiv
Top 0.1%
12.8%

Cardiovascular disease is the leading cause of death worldwide. Sudden cardiac death (SCD) accounts for roughly 50% of all cardiac deaths. The electrocardiogram (ECG) is widely used for early diagnosis of cardiac disease. However, the complexity of accurate interpretation limits the ECG's efficacy. Modern deep learning methods have been applied to assist clinicians in diagnosis. We applied Neural Architecture Search (NAS), an automated machine learning technique, to identify optimal deep learning architectures for classifying cardiac arrhythmias from ECGs. We applied the Differentiable Architecture Search strategy to an AutoFormer search space to identify optimal self-attention architectures for arrhythmia classification. We trained, validated, and tested the resulting model on the PhysioNet Challenge 2021 dataset (n = 88,253), comprising ECGs across three continents. We performed a hyperparameter optimisation on the NAS output, exploring input patch size, class weighting, and loss function. We evaluated performance using the PhysioNet Challenge metric and the area under the receiver operating characteristic curve (AUROC). The NAS converged towards minimal architectural configurations (embedding dimension: 384, depth: 4, self-attention heads: 4, MLP ratio: 1) with a validation challenge metric of 0.66 (PhysioNet Challenge 21 Winner: 0.63). The NAS-created network achieved an AUROC of 0.97 and a challenge metric of 0.71 during testing. Normal Sinus Rhythm and Sinus Tachycardia achieved AUROCs of 0.99. Low-QRS Voltage and T-wave abnormality were the worst-performing arrhythmias, with AUROCs of 0.89 and 0.90, respectively. We interpret that architectural simplicity drives performance in arrhythmia classification. Because SCD is unexpected, prevention strategies in free-living environments require lightweight computational resources suitable for wearable devices. 
Class imbalance fundamentally limits classification performance for rare arrhythmias such as Low-QRS Voltage and T-wave inversion, irrespective of hyperparameter choices. However, the self-attention mechanism can autonomously abstract clinical representations, simplifying clinical deployment by eliminating the need for an explicit feature-extraction pipeline.

13
Biomarker Signal Architecture in Cardiovascular Machine Learning: Stability, Redundancy, and Minimal High-Yield Panels After Myocardial Infarction

Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.

2026-05-22 cardiovascular medicine 10.64898/2026.05.19.26353638 medRxiv
Top 0.1%
10.3%

Background: Machine-learning models based on circulating biomarkers are increasingly used in cardiovascular research; however, model performance alone provides limited insight into how the predictive signal is distributed across features. We aimed to characterize the biomarker signal architecture of a machine-learning model distinguishing ST-elevation myocardial infarction (STEMI) from non-ST-elevation myocardial infarction (NSTEMI), with a focus on signal concentration, redundancy, and conditional complementarity. Methods: We conducted a structured secondary analysis of a previously established, leakage-controlled machine-learning framework (n = 152 patients). The BIOMARKERS feature-set variant (10 biomarkers) was evaluated using outer-fold cross-validation. Model structure was interrogated using (i) leave-one-biomarker-out analysis, (ii) pairwise leave-two-out analysis with pair-excess estimation, (iii) cumulative ablation of top-ranked biomarkers, and (iv) forward reconstruction of minimal biomarker panels. Uncertainty was assessed using bootstrap resampling across folds. Results: The full biomarker model achieved a mean ROC-AUC approaching 0.94. The predictive signal was highly non-uniform, with MMP-2 showing the largest single-feature contribution (mean ΔAUC ≈ 0.16). Pairwise analysis identified conditional complementarity between selected non-lipid biomarkers, particularly MMP-2 and EMMPRIN (pair ΔAUC ≈ 0.26; positive excess over single-feature effects), whereas lipid-related markers formed a highly correlated and largely redundant sub-cluster. Cumulative ablation demonstrated rapid performance collapse following removal of top-ranked biomarkers, consistent with structural signal concentration. Forward panel analysis showed that a compact subset of biomarkers (three features) achieved performance within ~0.01 ROC-AUC of the full model, indicating the presence of a minimal high-yield panel.
Bootstrap confidence intervals suggested that small performance differences should be interpreted with caution. Conclusions: Predictive performance in this biomarker-based model arises from a structured and unevenly distributed signal architecture, characterized by a dominant core biomarker, conditionally complementary contributors, and a redundant lipid cluster. These findings highlight the importance of evaluating model structure, not only aggregate performance, and suggest that biomarker-based machine-learning systems may benefit from architecture-aware interpretation and simplification strategies.
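One way to read the pair-excess figures in this abstract is as the ROC-AUC drop from removing two biomarkers together, minus the sum of their individual removal drops. The sketch below assumes that additive definition, which the abstract does not spell out. The MMP-2 and pair values are the approximate figures quoted above; the EMMPRIN single-feature value is a hypothetical placeholder, since the abstract does not report it.

```python
def pair_excess(delta_pair: float, delta_a: float, delta_b: float) -> float:
    """Excess AUC loss of a joint removal over the sum of the two
    single-feature removals (assumed definition, not taken from the paper)."""
    return delta_pair - (delta_a + delta_b)


DELTA_MMP2 = 0.16     # approximate single-feature drop quoted in the abstract
DELTA_PAIR = 0.26     # approximate MMP-2 + EMMPRIN joint drop quoted in the abstract
DELTA_EMMPRIN = 0.05  # HYPOTHETICAL value, for illustration only

excess = pair_excess(DELTA_PAIR, DELTA_MMP2, DELTA_EMMPRIN)

# A positive excess means the pair loses more jointly than the two singles
# would predict, i.e. each marker partly compensates for the other's absence.
label = "complementary" if excess > 0 else "redundant or additive"
print(f"pair excess = {excess:.2f} ({label})")
```

Under this reading, a redundant cluster such as the lipid markers would show a small single-feature drop for each member, because the remaining correlated markers still carry the shared signal.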

14
Hidden risk in normal myocardial perfusion scans: AI-detected proximal coronary calcium on CT attenuation maps improves prognosis

Zhou, J.; Miller, R. J.; Shanbhag, A.; Killekar, A.; Han, D.; Patel, K. K.; Pieszko, K.; Yi, J.; Urs, M. K.; Ramirez, G.; Lemley, M.; Kavanagh, P. B.; Liang, J. X.; Kamagate, A.; Builoff, V.; Einstein, A. J.; Feher, A.; Miller, E. J.; Sinusas, A. J.; Ruddy, T. D.; Knight, S.; Le, V. T.; Mason, S.; Chareonthaitawee, P.; Wopperer, S.; Alexanderson, E.; Carvajal-Juarez, I.; Rosamond, T. L.; Slipczuk, L.; Travin, M. I.; Packard, R. R.; Acampa, W.; Al-Mallah, M.; deKemp, R. A.; Buechel, R. R.; Berman, D. S.; Dey, D.; Di Carli, M. F.; Slomka, P. J.

2026-04-15 cardiovascular medicine 10.64898/2026.04.14.26350808 medRxiv
Top 0.1%
10.1%

Purpose: Spatial distribution of coronary artery calcium (CAC) may provide additional prognostic value in patients undergoing SPECT and PET myocardial perfusion imaging (MPI). We aimed to automatically identify CAC in proximal segments from attenuation correction CT (CTAC) scans using artificial intelligence (AI) and to evaluate its prognostic significance in two large international multicenter registries. Methods: From hybrid MPI/CT imaging (N=43,099) across 15 sites, we included the 4,552 most relevant patients with 1) no prior coronary artery disease; 2) AI-derived mild CAC scores (1-99); and 3) normal perfusion (stress total perfusion deficit <5%). The independent associations between AI-identified proximal CAC and major adverse cardiovascular events (MACE) and all-cause mortality (ACM) were evaluated using multivariable Cox regression, likelihood ratio test (LRT), and continuous net reclassification index (NRI). Results: Among the patients with mild CAC and normal perfusion (mean age 65±12 years, 51% male), 1,730 (38%) had proximal CAC. Over 3.6 (interquartile interval 2.1, 5.2) years of follow-up, 599 (13%) and 444 (10%) patients had MACE or ACM, respectively. Proximal CAC was associated with an increased risk of MACE (adjusted hazard ratio [HR] 1.24, 95% CI 1.03-1.48, P=0.02) and ACM (adjusted HR 1.25, 95% CI 1.01-1.53, P=0.04) after adjustment for CAC score and density, clinical risk factors, and perfusion deficit. Proximal CAC improved the risk stratification of MACE (LRT P=0.02; NRI 12%) and ACM (LRT P=0.04; NRI 12%). Conclusion: In patients with mild CAC and normal perfusion, AI detection of proximal CAC identified a higher-risk group for adverse outcomes, highlighting its prognostic utility.
Graphical Abstract: From patients who underwent hybrid myocardial perfusion imaging (MPI) from 15 sites, we analyzed those without prior coronary artery disease (CAD), mild coronary artery calcium (CAC) scores (1-99), and normal perfusion (stress total perfusion deficit <5%). A previously developed AI model was used to identify CAC lesions in proximal coronary segments on CT attenuation correction maps (CTAC). We evaluated associations with major adverse cardiovascular events (MACE) and all-cause mortality (ACM), showing risk stratification of proximal CAC and improvement by net reclassification index (NRI). CAC lesion color: green, left anterior descending artery (LAD) with left main artery; red, left circumflex artery (LCX); yellow, right coronary artery (RCA). Adjusted hazard ratios (HRs) are shown with 95% confidence intervals.

15
Estimation of Physiological Metrics from Resting ECGs Using Deep Learning in the UK Biobank, Including submaximal exercise derived VO2max, Body Fat Percentage, and Grip Strength

Mankowski, I.; Pinter, E.; Lee, I.-M.; Raetsch, G.; Demler, O.

2026-05-13 cardiovascular medicine 10.64898/2026.05.09.26352818 medRxiv
Top 0.1%
10.0%

Maximal oxygen consumption (VO2max) is the gold standard for cardiorespiratory fitness but requires resource-intensive physical testing. Recent reports show that machine learning models can extract additional information from ECGs, yet the potential of ECG as a source of physiological metrics remains underutilized. While routinely collected resting electrocardiograms (ECG) provide an opportunistic window into cardiorespiratory fitness, current deep learning models often struggle with cross-cohort transferability or remain dependent on active exercise data. We developed population-specific models using the UK Biobank to estimate submaximal exercise-derived VO2max (N = 8,540) and a panel of other physiological metrics (sample sizes up to N = 78,265) from resting 12-lead ECGs using Patient Contrastive Learning of Representations (PCLR), an AI-based tool that converts an ECG into a set of 320 features (ECG-PCLR). Data were split 80%:20% (training:test) and models were evaluated on a set-aside test subset. We demonstrate that ECG-PCLR embeddings alone can estimate submaximal VO2max and body fat percentage with Pearson correlations (r) of 0.61 and 0.65, respectively. They also estimate systolic blood pressure, forced expiratory volume in 1 second (FEV1), and grip strength with r values from 0.31 to 0.55. Adding ECG embeddings to basic predictors (age, sex and BMI) improves submaximal VO2max prediction by an absolute ΔR² of 8% and by 1% to 13% for other physiologic parameters.

16
AI-driven selection of patients with non-valvular atrial fibrillation for oral anticoagulation therapy: a multi-cohort validation and impact evaluation study

Rao, S.; Walli-Attaei, M.; Ahmed, N.; Fan, Z.; Petrazzini, B.; Lian, J.; Ghamari, S.; Wamil, M.; Lip, G. Y. H.; Leal, J.; Rahimi, K.

2026-03-25 cardiovascular medicine 10.64898/2026.03.23.26349067 medRxiv
Top 0.1%
10.0%

Background: Current risk assessment tools for guiding direct oral anticoagulant (DOAC) therapy for patients with atrial fibrillation (AF) based on clinical risk factors demonstrate modest predictive performance, limiting clinical impact. Additionally, while guidelines recommend periodic reassessment of risk over time, there remains an absence of modelling solutions for capturing evolving risk in AF patients. Methods: Using UK electronic health records, we developed and validated the Transformer-based Risk assessment survival model (TRisk), an artificial intelligence model that predicts 12-month thromboembolic and bleeding events in AF patients by leveraging temporal patient journeys up to baseline. A cohort of 411,850 prevalent non-valvular AF patients aged ≥18 years between 2010 and 2020 was identified from 1,442 English general practices. Practices were randomly allocated to derivation (n=1,079) and external validation (n=363) cohorts. TRisk was compared with CHA2DS2-VASc and CHA2DS2-VA for thromboembolic event prediction, and HAS-BLED and ORBIT for bleeding prediction, with subgroup analyses by sex, age, and baseline characteristics. A second validation of TRisk was also performed on 16,218 US AF patients between 2010 and 2023. A decision model compared outcomes and healthcare costs for TRisk versus standard care. Findings: TRisk achieved higher discrimination for thromboembolic event prediction (C-index: 0.82; 95% confidence interval [CI]: [0.81, 0.83]) as compared to CHA2DS2-VASc (0.71 [0.70, 0.73]) in UK validation. Application of TRisk to US data yielded a similar C-index: 0.82 (0.80, 0.84). For bleeding prediction, TRisk (C-index: 0.70 [0.69, 0.71]) outperformed both HAS-BLED (0.63 [0.61, 0.64]) and ORBIT (0.64 [0.63, 0.65]), with comparable US results (0.71 [0.69, 0.74]). The model remained well-calibrated across both populations and performed equitably across subgroups, including by race and during the COVID-19 pandemic.
Impact analyses showed TRisk could reduce DOAC prescriptions by 8% in the UK and 7% in the US relative to guideline-recommended approaches, while preventing at least as many thromboembolic events. This refined approach would generate annual healthcare savings of GBP 5.5 million and USD 456.2 million in the UK and US respectively among patients initiating DOACs, rising to GBP 48.6 million and USD 1.8 billion when extended to all AF patients on DOACs. Interpretation: TRisk enabled more precise prediction for both thromboembolic and bleeding events across AF populations in UK and US compared to established clinical scoring systems. Incorporating TRisk into routine AF care would result in substantial cost savings without compromising the identification of true high-risk patients. Funding: None

17
DIVAID: Consistent division of atrial geometries from multimodal imaging according to the EHRA/EACVI 15-segment bi-atrial model

Goetz, C.; Eichenlaub, M.; Schmidt, K.; Wiedmann, F.; Invers Rubio, E.; Martinez Diaz, P.; Luik, A.; Althoff, T.; Schmidt, C.; Loewe, A.

2026-04-23 cardiovascular medicine 10.64898/2026.04.22.26351448 medRxiv
Top 0.1%
9.8%

The recently published EHRA/EACVI consensus statement on a standardized bi-atrial regionalization provides new opportunities for consistent regional analyses across patients, imaging modalities and clinical centers. To make this standardized regionalization widely accessible, we developed the open-source software DIVAID, which automatically divides bi-atrial geometries according to the proposed regions, ensuring consistency, reproducibility and operator independence. We evaluated the accuracy of the algorithm by comparing its results to manual expert annotations across 140 geometries from multiple modalities and centers. Veins were automatically clipped correctly in 81% and orifices annotated correctly in 100% of cases. The median (interquartile range; IQR) Dice similarity coefficient (DSC) for left atrial regions was 0.98 (0.96 - 1.00) for DIVAID-expert and 0.98 (0.94 - 1.00) for inter-expert comparisons. For right atrial geometries, DSC was higher for DIVAID-expert than for inter-expert comparisons at 0.90 (0.80 - 0.95) and 0.88 (0.74 - 0.94), respectively. To assess the accuracy of regional boundaries, we computed the mean average surface distance (MASD) for boundaries derived from automatic or manual annotations. The median (IQR) MASD between DIVAID and experts was 0.17 mm (0.03 - 0.78) and 1.93 mm (0.65 - 3.96) in the left and right atrium, respectively. To conclude, DIVAID robustly divides anatomically diverse bi-atrial geometries according to the 15-segment model, while outperforming cardiac experts in both speed and consistency, and demonstrating an accuracy of regional boundaries comparable to the spatial resolution of cardiac imaging modalities. By providing automated, consistent atrial regionalization, DIVAID enables large-scale, standardized regional analyses and data-driven investigation of harmonized, multi-dimensional datasets, which may advance atrial arrhythmia research and personalized treatment strategies.
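For readers unfamiliar with the Dice similarity coefficient reported in this abstract, a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), follows. It assumes each region is represented as a set of mesh-element indices; that representation, the function name, and the toy index sets are illustrative assumptions and are not taken from DIVAID itself.

```python
def dice_coefficient(region_a: set, region_b: set) -> float:
    """Standard Dice similarity coefficient between two sets of element IDs."""
    if not region_a and not region_b:
        # Two empty regions agree trivially; this convention is an assumption.
        return 1.0
    overlap = len(region_a & region_b)
    return 2.0 * overlap / (len(region_a) + len(region_b))


# Hypothetical element indices assigned to one atrial region by an
# automatic division and by a manual expert annotation.
automatic = {0, 1, 2, 3, 4, 5, 6, 7}
manual = {2, 3, 4, 5, 6, 7, 8, 9}

score = dice_coefficient(automatic, manual)
print(f"DSC = {score:.2f}")
```

A DSC of 1.0 means the two annotations cover exactly the same elements, and 0.0 means they share none.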

18
Integrated Right-Heart Remodeling Phenotypes and Prognosis in Tricuspid Regurgitation: An Automated Strain Echocardiography Study

Park, J.; Kwak, S.; Yoon, Y. E.; Park, J.-B.; Kim, J.; Jeon, J.; Jang, Y.; Lee, S.-A.; Bak, M.; Choi, H.-M.; Hwang, I.-C.; Lee, S.-P.; Kim, H.-K.; Kim, Y.-J.; Cho, G.-Y.

2026-06-01 cardiovascular medicine 10.64898/2026.05.28.26354377 medRxiv
Top 0.1%
9.8%

Background: Echocardiographic assessment of tricuspid regurgitation (TR) remains valve-centric, and right-heart remodeling is not captured. Strain parameters carry prognostic value but are evaluated in isolation. Objectives: To develop integrated right atrial (RA) and right ventricular (RV) remodeling indices using automated echocardiography and assess their utility for TR severity grading, phenotyping, and prognostic stratification. Methods: We analyzed 8,231 patients with functional TR (mild-or-greater) from two tertiary centers (2023-2024) using an automated AI-based echocardiographic solution. The RA remodeling index (RA reservoir strain/RA volume index) and RV remodeling index (RV free wall strain/RV end-diastolic area) were derived automatically; patients were classified into four RA-RV remodeling phenotypes. The primary outcome was all-cause death or heart failure (HF) hospitalization. Results: During median follow-up of 19.3 months, the primary outcome occurred in 574 patients (7.0%). Both indices outperformed individual components for severe TR discrimination (RA: AUC 0.857 vs. 0.757; RV: 0.710 vs. 0.601; both P<0.05). After multivariate adjustment, the RA (HR per unit decrease, 1.27; 95% CI, 1.09-1.49; P=0.002) and RV remodeling indices (2.32; 1.76-3.06; P<0.001) were independently associated with the primary outcome; on mutual adjustment, only the RV index retained significance and provided incremental prognostic value (ΔC-index +0.010; NRI +0.237; both P<0.05). The four phenotypes showed progressively divergent risk (log-rank P<0.001), with combined remodeling (Low RA/Low RV) carrying the highest risk. Conclusions: Automated integrated RA and RV remodeling indices improved TR severity discrimination and enabled clinically meaningful right-heart phenotyping. The RV index conferred incremental prognostic value, whereas the RA index better reflected atrial-stage remodeling and disease burden.

19
Early Prediction of Post-TAVR Left Ventricular Remodeling Using CT-Derived Radiomics and Clinical Variables

Rezaeitaleshmahalleh, M.; Masoumi, S.; Razaviamri, F.; Rouhollahi, A.; Zancanaro, E.; Danesi, T. H.; Ayers, B. C.; Jassar, A.; Sabe, A.; Nezami, F. R.

2026-06-02 cardiovascular medicine 10.64898/2026.05.28.26354361 medRxiv
Top 0.1%
8.9%

Background: Adverse left ventricular (LV) remodeling after transcatheter aortic valve replacement (TAVR) is associated with impaired functional recovery and adverse long-term outcomes, yet imaging-based risk stratification remains limited. Objectives: This study sought to determine whether CT-derived radiomic and geometric myocardial features, integrated with procedural and clinical variables, can predict adverse LV remodeling after TAVR. Methods: We retrospectively analyzed 232 consecutive TAVR recipients with paired pre- and post-procedural LV mass index (LVMI) measurements. Adverse remodeling was defined as a ≥10% increase in LVMI at follow-up. Pre-procedural CT was used to derive three-dimensional LV geometric descriptors, ray-tracing wall-thickness metrics, and myocardial texture radiomic features. Random forest classifiers were developed across six models of sequentially increasing complexity. Results: Adverse LV remodeling occurred in 52 patients (22.4%). The geometry-only model showed limited discrimination (AUC 0.62), whereas wall-thickness radiomics substantially improved performance (AUC 0.84). A multimodal pre-procedural model combining CT radiomics with pre-procedural LVMI, residual valve insufficiency, and prior coronary revascularization achieved an AUC of 0.86 (95% CI 0.73 to 0.98). Addition of post-procedural mean transvalvular gradient further improved discrimination (AUC 0.91, 95% CI 0.81 to 0.98). SHAP analysis identified post-procedural mean aortic gradient and radiomic markers of myocardial heterogeneity as the leading predictors. Conclusions: CT-derived radiomic characterization of myocardial heterogeneity provides incremental prognostic information beyond conventional geometric assessment for identifying patients at risk of adverse LV remodeling after TAVR.
These findings extend the role of pre-procedural CT beyond anatomical planning toward quantitative myocardial phenotyping and individualized risk stratification, although prospective validation is required to establish clinical utility.
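The outcome label in this abstract, a ≥10% increase in LVMI at follow-up, can be sketched as a relative-change threshold. The snippet below assumes the increase is measured relative to the pre-procedural value; the function names and the example LVMI values are hypothetical, and the abstract does not describe the authors' exact computation.

```python
def lvmi_change_percent(pre_lvmi: float, post_lvmi: float) -> float:
    """Relative LVMI change in percent, with the pre-procedural value as baseline."""
    if pre_lvmi <= 0:
        raise ValueError("pre-procedural LVMI must be positive")
    return 100.0 * (post_lvmi - pre_lvmi) / pre_lvmi


def is_adverse_remodeling(pre_lvmi: float, post_lvmi: float,
                          threshold_percent: float = 10.0) -> bool:
    """True when the LVMI increase meets or exceeds the threshold (10% here)."""
    return lvmi_change_percent(pre_lvmi, post_lvmi) >= threshold_percent


# Hypothetical paired measurements in g/m^2 (pre, post).
examples = [(100.0, 112.0), (100.0, 105.0), (100.0, 110.0), (120.0, 96.0)]
labels = [is_adverse_remodeling(pre, post) for pre, post in examples]
print(labels)
```

Because the threshold is inclusive, a patient whose LVMI rises by exactly 10% is labelled as adverse remodeling in this sketch.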

20
Explainable Advanced Electrocardiography Heart Age Shows Good Reproducibility in Healthy Young Adults

Warrington, C. R.; Al-Falahi, Z.; Premawardhana, U.; Ugander, M.; Green, S.

2026-03-25 cardiovascular medicine 10.64898/2026.03.24.26349147 medRxiv
Top 0.1%
8.3%

Aims: Explainable advanced electrocardiography (A-ECG) can be used to estimate heart age from the standard 12-lead ECG. A-ECG heart age gap (HAG) represents the difference between A-ECG heart age and chronological age. Increased A-ECG HAG is associated with cardiovascular outcomes and can be used to communicate risk. The aim was to investigate whether A-ECG heart age demonstrates acceptable within- and between-session reproducibility. Methods: Healthy adults (n=42, age 23±4 years, 52% male) attended up to two sessions ~14 days apart, with 36 participants completing both sessions. During each session, five standard resting 12-lead ECGs were obtained while lying in the supine position with unchanged electrode positions. A-ECG heart age was extracted using dedicated software. Within-session reproducibility was assessed using all five recorded ECGs with coefficient of variation (CV) and a two-way random effects intraclass correlation coefficient (ICC). Between-session reproducibility was assessed using the first recorded ECG of each session with a paired t-test, CV and ICC. A further analysis assessed the reproducibility of the parameters used in the A-ECG heart age regression model. Results: A-ECG heart age showed excellent within-session reproducibility in sessions one and two (both CV 5.8%, ICC 0.99). A-ECG heart age was slightly lower in session one than in session two (24.0±7.5 vs. 25.5±7.8 years, p=0.04) and showed good between-session reproducibility (CV 8.3%, ICC 0.84). All but one parameter used to estimate A-ECG heart age showed acceptable within- and between-session reproducibility (CV<10%). Conclusion: A-ECG heart age demonstrates excellent within-session reproducibility and good between-session reproducibility in healthy young adults.